
ML-As-2

Point Estimation

The Poisson distribution is a useful discrete distribution for modeling the number of occurrences of an event per unit time. For example, in networking, packet arrival counts are often modeled with the Poisson distribution. If $X$ is Poisson distributed, i.e., $X \sim \text{Poisson}(\lambda)$, its probability mass function takes the following form:

$$P(X\mid\lambda)=\frac{\lambda^{X}e^{-\lambda}}{X!}$$

It can be shown that $E(X)=\lambda$. Assume now we have $n$ i.i.d. data points from $\text{Poisson}(\lambda)$: $D=\{X_1,\ldots,X_n\}$. (For the purpose of this problem, you can only use the knowledge about the Poisson and Gamma distributions provided in this problem.)

(a)

Show that the sample mean $\hat\lambda=\frac{1}{n}\sum_{i=1}^{n}X_i$ is the maximum likelihood estimate (MLE) of $\lambda$ and that it is unbiased ($E[\hat\lambda]=\lambda$).

Finding the MLE

$$L(\lambda)=\prod_{i=1}^{n}P(X_i\mid\lambda)=\prod_{i=1}^{n}\frac{\lambda^{X_i}e^{-\lambda}}{X_i!}$$

$$\ln L(\lambda)=\sum_{i=1}^{n}\bigl(X_i\ln\lambda-\lambda-\ln(X_i!)\bigr)$$

$$\frac{d}{d\lambda}\ln L(\lambda)=\sum_{i=1}^{n}\Bigl(\frac{X_i}{\lambda}-1\Bigr)=0\;\Longrightarrow\;\sum_{i=1}^{n}X_i=n\lambda\;\Longrightarrow\;\hat\lambda=\frac{1}{n}\sum_{i=1}^{n}X_i$$

Unbiasedness

$$E(\hat\lambda)=E\Bigl(\frac{1}{n}\sum_{i=1}^{n}X_i\Bigr)$$

By linearity of expectation, and since the $X_i$ are identically distributed with mean $\lambda$, we can take the expectation inside the sum:

$$E(\hat\lambda)=\frac{1}{n}\sum_{i=1}^{n}E(X_i)=\frac{1}{n}\sum_{i=1}^{n}\lambda=\frac{n\lambda}{n}=\lambda$$

Therefore, $E(\hat\lambda)=\lambda$, confirming that $\hat\lambda$ is an unbiased estimator of $\lambda$.
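As a quick numerical sanity check (a minimal sketch, not part of the assignment; the true rate $\lambda=3$ and the sample sizes are arbitrary illustrations), the snippet below confirms that maximizing the Poisson log-likelihood numerically recovers the sample mean, and that averaging $\hat\lambda$ over many repeated samples comes out close to $\lambda$:

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import gammaln

rng = np.random.default_rng(0)
lam_true = 3.0          # arbitrary true rate for the check
n = 50                  # sample size

X = rng.poisson(lam_true, size=n)

# Negative log-likelihood of Poisson(lambda) for the sample X
def neg_log_lik(lam):
    return -np.sum(X * np.log(lam) - lam - gammaln(X + 1))

# Numerical maximizer of the likelihood vs. the closed-form MLE (sample mean)
mle_numeric = minimize_scalar(neg_log_lik, bounds=(1e-6, 20), method="bounded").x
print("numerical MLE:", mle_numeric, " sample mean:", X.mean())

# Unbiasedness: average the estimator over many independent samples
estimates = rng.poisson(lam_true, size=(10000, n)).mean(axis=1)
print("mean of lambda_hat over 10000 trials:", estimates.mean(), " true lambda:", lam_true)
```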

(b)

Now let's be Bayesian and put a prior distribution over $\lambda$. Assume that $\lambda$ follows a Gamma distribution with parameters $(\alpha,\beta)$; its probability density function is:

$$p(\lambda\mid\alpha,\beta)=\frac{\beta^{\alpha}}{\Gamma(\alpha)}\lambda^{\alpha-1}e^{-\beta\lambda}$$

where $\Gamma(\alpha)=(\alpha-1)!$ (here we assume $\alpha$ is a positive integer). Compute the posterior distribution of $\lambda$.

$$P(\lambda\mid X)=\frac{P(X\mid\lambda)\,p(\lambda\mid\alpha,\beta)}{P(X)}$$

$$P(\lambda\mid X)\propto P(X\mid\lambda)\,p(\lambda\mid\alpha,\beta)=\frac{\lambda^{X}e^{-\lambda}}{X!}\cdot\frac{\beta^{\alpha}}{\Gamma(\alpha)}\lambda^{\alpha-1}e^{-\beta\lambda}$$

$$P(\lambda\mid X)\propto\lambda^{X+\alpha-1}e^{-\lambda(\beta+1)}$$

Let $\alpha'=X+\alpha$ and $\beta'=\beta+1$. Then the posterior is still a Gamma distribution: $\lambda\mid X\sim\text{Gamma}(\alpha',\beta')$.
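As a hedged numerical check of this conjugacy (the single observation $x=4$ and the prior parameters $\alpha=2$, $\beta=1$ are made-up illustrative values), one can compare the grid-normalized product of the Poisson likelihood and the Gamma prior against the closed-form $\text{Gamma}(x+\alpha,\,\beta+1)$ density:

```python
import numpy as np
from scipy.stats import poisson, gamma

x_obs, alpha, beta = 4, 2.0, 1.0     # illustrative observation and prior parameters

lam_grid = np.linspace(1e-3, 20, 2000)

# Unnormalized posterior: Poisson likelihood times Gamma(alpha, beta) prior
unnorm = poisson.pmf(x_obs, lam_grid) * gamma.pdf(lam_grid, a=alpha, scale=1.0 / beta)
posterior_numeric = unnorm / np.trapz(unnorm, lam_grid)

# Closed-form posterior: Gamma(x + alpha, beta + 1)
posterior_closed = gamma.pdf(lam_grid, a=x_obs + alpha, scale=1.0 / (beta + 1))

print("max abs difference:", np.abs(posterior_numeric - posterior_closed).max())
```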

(c)

Derive an analytic expression for the maximum a posteriori (MAP) estimate of $\lambda$ under a $\text{Gamma}(\alpha,\beta)$ prior.

$$\lambda_{\mathrm{MAP}}=\arg\max_{\lambda}P(\lambda\mid D)=\arg\max_{\lambda}\frac{\prod_{i=1}^{n}P(X_i\mid\lambda)\,P(\lambda)}{P(D)}=\arg\max_{\lambda}\prod_{i=1}^{n}P(X_i\mid\lambda)\,P(\lambda)$$

$$=\arg\max_{\lambda}\log\Bigl(\prod_{i=1}^{n}P(X_i\mid\lambda)\,P(\lambda)\Bigr)=\arg\max_{\lambda}\Bigl(\sum_{i=1}^{n}\log P(X_i\mid\lambda)+\log P(\lambda)\Bigr)$$

Prior Distribution P(λ)

$$P(\lambda\mid\alpha,\beta)=\frac{\beta^{\alpha}}{\Gamma(\alpha)}\lambda^{\alpha-1}e^{-\beta\lambda}\qquad\Longrightarrow\qquad\log P(\lambda\mid\alpha,\beta)\propto(\alpha-1)\log\lambda-\beta\lambda$$

Likelihood function P(Xi|λ)

$$P(X\mid\lambda)=\frac{\lambda^{X}e^{-\lambda}}{X!}\qquad\Longrightarrow\qquad\log P(X_i\mid\lambda)\propto X_i\log\lambda-\lambda$$

Combining the likelihood terms with the log prior:

$$\log P(\lambda\mid D)\propto\sum_{i=1}^{n}\log P(X_i\mid\lambda)+\log P(\lambda)\propto\sum_{i=1}^{n}X_i\log\lambda-n\lambda+(\alpha-1)\log\lambda-\beta\lambda=\log\lambda\Bigl(\sum_{i=1}^{n}X_i+\alpha-1\Bigr)-\lambda(n+\beta)$$

$$\frac{d}{d\lambda}\log P(\lambda\mid D)=\frac{\sum_{i=1}^{n}X_i+\alpha-1}{\lambda}-(n+\beta)=0\;\Longrightarrow\;\lambda_{\mathrm{MAP}}=\frac{\sum_{i=1}^{n}X_i+\alpha-1}{n+\beta}$$
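A small sketch to check this closed form against a direct numerical maximization of the log-posterior (the data and the prior parameters below are arbitrary illustrative choices):

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
alpha, beta = 3.0, 2.0                     # illustrative Gamma prior parameters
X = rng.poisson(5.0, size=40)              # illustrative Poisson data
n = X.size

# Log-posterior up to a constant: (sum_i X_i + alpha - 1) log(lam) - (n + beta) lam
def neg_log_post(lam):
    return -((X.sum() + alpha - 1) * np.log(lam) - (n + beta) * lam)

map_numeric = minimize_scalar(neg_log_post, bounds=(1e-6, 20), method="bounded").x
map_closed = (X.sum() + alpha - 1) / (n + beta)
print("numerical MAP:", map_numeric, " closed form:", map_closed)
```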

Source of Error: Part 1

(a)

The bias of an estimator is defined as $E[\hat\mu]-\mu$.

The bias is $1-\mu$.

The variance of an estimator is defined as $\mathrm{Var}(\hat\mu)=E\bigl[(\hat\mu-E[\hat\mu])^{2}\bigr]$.

$$\mathrm{Var}(\hat\mu)=0$$

This is not a good estimator, since the bias is large when the true value of μ is not 1. Usually we don’t have any information about the true value of μ, so it is unreasonable to assume it is equal to 1.

(b)

Since $E(\hat\mu)=\mu$, the bias is 0, so this is an unbiased estimator. The variance of this estimator is $\mathrm{Var}(\hat\mu)=\mathrm{Var}(y_1)=1$.

This is not a good estimator since its variability does not decrease with the sample size.
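A quick simulation illustrates the point; this sketch assumes unit-variance Gaussian observations, consistent with $\mathrm{Var}(y_1)=1$ above, and the mean $\mu=2$ is an arbitrary illustrative value:

```python
import numpy as np

rng = np.random.default_rng(2)
mu, trials = 2.0, 20000

for n in (5, 50, 500):
    y = rng.normal(mu, 1.0, size=(trials, n))
    # Var(y1) stays near 1 regardless of n; the sample mean's variance shrinks like 1/n
    print(f"n={n:4d}  Var(y1) = {y[:, 0].var():.3f}   Var(sample mean) = {y.mean(axis=1).var():.3f}")
```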

(c)

Setting the derivative of the regularized objective $\sum_{i}(y_i-\mu)^2+\lambda\mu^2$ to zero:

$$-2\sum_{i=1}^{n}(y_i-\mu)+2\lambda\mu=0\;\Longrightarrow\;\hat\mu=\frac{1}{n+\lambda}\sum_i y_i=\frac{n}{n+\lambda}\bar y$$

$$E[\hat\mu]=\frac{1}{n+\lambda}E\Bigl[\sum_i y_i\Bigr]=\frac{n}{n+\lambda}\mu$$

Bias of the estimator :

$$\mathrm{bias}=E[\hat\mu]-\mu=-\frac{\lambda\mu}{n+\lambda}$$

Variance of the estimator :

$$\mathrm{Var}(\hat\mu)=\mathrm{Var}\Bigl(\frac{1}{n+\lambda}\sum_i y_i\Bigr)=\frac{1}{(n+\lambda)^{2}}\sum_i\mathrm{Var}(y_i)=\frac{n}{(n+\lambda)^{2}}\sigma^{2}$$
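These bias and variance expressions can be checked by simulation (a sketch; the values $\mu=2$, $\sigma=1$, $n=20$, $\lambda=5$ are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(3)
mu, sigma, n, lam, trials = 2.0, 1.0, 20, 5.0, 200000

y = rng.normal(mu, sigma, size=(trials, n))
mu_hat = y.sum(axis=1) / (n + lam)          # regularized estimator (n/(n+lam)) * y_bar

print("empirical bias:   ", mu_hat.mean() - mu)
print("theoretical bias: ", -lam * mu / (n + lam))
print("empirical var:    ", mu_hat.var())
print("theoretical var:  ", n * sigma**2 / (n + lam) ** 2)
```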

Source of Error: Part 2

(a)

(b)

The error is equal to 0.

Because p(X|Y=0) and p(X|Y=1) do not overlap.

To classify a point, just check whether it lies in the interval $[-4,-1]$ or in the interval $[1,4]$.

(c)

$$P[\text{error}]=P[x\in[0,1]]\times P[\text{error}\mid x\in[0,1]]=\bigl(P[x\in[0,1]\mid y=0]P[y=0]+P[x\in[0,1]\mid y=1]P[y=1]\bigr)\times P[\text{error}\mid x\in[0,1]]$$

$$=\Bigl(\frac{1}{4}\times\frac{1}{2}+\frac{1}{4}\times\frac{1}{2}\Bigr)\times\frac{1}{2}=\frac{1}{8}$$

(d)

  • $E[X\mid Y=0]=-2.5$ and $\mathrm{Var}[X\mid Y=0]=\frac{3}{4}$ (using the variance formula for the uniform distribution),
  • $E[X\mid Y=1]=2.5$ and $\mathrm{Var}[X\mid Y=1]=\frac{3}{4}$.

Since we are approximating p(X|Y) using a normal distribution, we have:

  • $\hat p(X\mid Y=0)=\mathcal N(-2.5,\,0.75)$,
  • $\hat p(X\mid Y=1)=\mathcal N(2.5,\,0.75)$.

Using these, for $x<0$ we find $\hat p(X\mid Y=0)>\hat p(X\mid Y=1)$, and for $x>0$, $\hat p(X\mid Y=0)<\hat p(X\mid Y=1)$. Therefore, the classifier will make no error in classifying new points.
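The zero-error claim can be checked directly with the two fitted Gaussians; this sketch assumes, as in part (b) above, that the true class-conditionals are uniform on $[-4,-1]$ for $Y=0$ and $[1,4]$ for $Y=1$, with equal priors:

```python
import numpy as np
from scipy.stats import norm

# Gaussian approximations fitted in part (d): N(-2.5, 0.75) and N(2.5, 0.75)
g0 = norm(loc=-2.5, scale=np.sqrt(0.75))
g1 = norm(loc=2.5, scale=np.sqrt(0.75))

# Sample x from the supports assumed in part (b): [-4,-1] (Y=0) and [1,4] (Y=1)
rng = np.random.default_rng(4)
x0 = rng.uniform(-4, -1, size=100000)
x1 = rng.uniform(1, 4, size=100000)

# Classify by comparing the approximate class-conditional densities (equal priors)
err0 = np.mean(g0.pdf(x0) < g1.pdf(x0))    # Y=0 points assigned to class 1
err1 = np.mean(g1.pdf(x1) < g0.pdf(x1))    # Y=1 points assigned to class 0
print("error rate:", 0.5 * (err0 + err1))  # expected to be 0
```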

(e)

Given a finite amount of data, we will not learn the mean and variance of $p(X\mid Y)$ perfectly. Therefore, the classifier's error will increase due to the limited data. In this scenario, the model's error would include both a bias and a variance component.

Gaussian (Naïve) Bayes and Logistic Regression

No, the new $P(Y\mid X)$ no longer has the form used by logistic regression.

$$P(Y=1\mid X)=\frac{P(Y=1)P(X\mid Y=1)}{P(Y=1)P(X\mid Y=1)+P(Y=0)P(X\mid Y=0)}=\frac{1}{1+\frac{P(Y=0)P(X\mid Y=0)}{P(Y=1)P(X\mid Y=1)}}$$

$$=\frac{1}{1+\exp\Bigl(\ln\frac{P(Y=0)P(X\mid Y=0)}{P(Y=1)P(X\mid Y=1)}\Bigr)}=\frac{1}{1+\exp\Bigl(\ln\frac{1-\pi}{\pi}+\ln\frac{P(X\mid Y=0)}{P(X\mid Y=1)}\Bigr)}=\frac{1}{1+\exp\Bigl(\ln\frac{1-\pi}{\pi}+\sum_i\ln\frac{P(X_i\mid Y=0)}{P(X_i\mid Y=1)}\Bigr)}$$

The log ratio of the class-conditional probabilities is:

$$\sum_i\ln\frac{P(X_i\mid Y=0)}{P(X_i\mid Y=1)}=\sum_i\ln\frac{\frac{1}{\sqrt{2\pi}\,\sigma_{i0}}\exp\Bigl(-\frac{(X_i-\mu_{i0})^{2}}{2\sigma_{i0}^{2}}\Bigr)}{\frac{1}{\sqrt{2\pi}\,\sigma_{i1}}\exp\Bigl(-\frac{(X_i-\mu_{i1})^{2}}{2\sigma_{i1}^{2}}\Bigr)}$$

Simplifies to:

$$=\sum_i\ln\frac{\sigma_{i1}}{\sigma_{i0}}+\sum_i\Bigl(\frac{(X_i-\mu_{i1})^{2}}{2\sigma_{i1}^{2}}-\frac{(X_i-\mu_{i0})^{2}}{2\sigma_{i0}^{2}}\Bigr)=\sum_i\ln\frac{\sigma_{i1}}{\sigma_{i0}}+\sum_i\frac{(\sigma_{i0}^{2}-\sigma_{i1}^{2})X_i^{2}+2(\mu_{i0}\sigma_{i1}^{2}-\mu_{i1}\sigma_{i0}^{2})X_i+\mu_{i1}^{2}\sigma_{i0}^{2}-\mu_{i0}^{2}\sigma_{i1}^{2}}{2\sigma_{i0}^{2}\sigma_{i1}^{2}}$$

Substituting into $P(Y=1\mid X)$:

$$P(Y=1\mid X)=\frac{1}{1+\exp\Bigl(\ln\frac{1-\pi}{\pi}+\sum_i\ln\frac{P(X_i\mid Y=0)}{P(X_i\mid Y=1)}\Bigr)}$$

Simplifies to:

$$P(Y=1\mid X)=\frac{1}{1+\exp\bigl(w_0+\sum_i w_i X_i+\sum_i v_i X_i^{2}\bigr)}$$

where

$$w_0=\ln\frac{1-\pi}{\pi}+\sum_i\Bigl(\ln\frac{\sigma_{i1}}{\sigma_{i0}}+\frac{\mu_{i1}^{2}\sigma_{i0}^{2}-\mu_{i0}^{2}\sigma_{i1}^{2}}{2\sigma_{i0}^{2}\sigma_{i1}^{2}}\Bigr),\qquad w_i=\frac{\mu_{i0}\sigma_{i1}^{2}-\mu_{i1}\sigma_{i0}^{2}}{\sigma_{i0}^{2}\sigma_{i1}^{2}},\qquad v_i=\frac{\sigma_{i0}^{2}-\sigma_{i1}^{2}}{2\sigma_{i0}^{2}\sigma_{i1}^{2}}$$

Because of the quadratic term $\sum_i v_i X_i^{2}$, which vanishes only when $\sigma_{i0}=\sigma_{i1}$ for all $i$, $P(Y=1\mid X)$ is no longer the linear-in-$X$ sigmoid form used by logistic regression.
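As a numerical check of the quadratic form (a sketch with made-up Gaussian naive Bayes parameters), the snippet below evaluates $P(Y=1\mid X)$ both from the coefficients $w_0$, $w_i$, $v_i$ derived above and directly from Bayes' rule, and the two agree; note that $v_i\neq 0$ whenever $\sigma_{i0}\neq\sigma_{i1}$:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)

# Illustrative Gaussian naive Bayes parameters with class-dependent variances
pi = 0.4                                                # P(Y=1)
mu0, mu1 = np.array([0.0, 1.0]), np.array([2.0, -1.0])  # class means per feature
s0, s1 = np.array([1.0, 0.5]), np.array([2.0, 1.5])     # sigma_{i0}, sigma_{i1}

# Coefficients of the quadratic form derived above
w0 = np.log((1 - pi) / pi) + np.sum(np.log(s1 / s0)
        + (mu1**2 * s0**2 - mu0**2 * s1**2) / (2 * s0**2 * s1**2))
w = (mu0 * s1**2 - mu1 * s0**2) / (s0**2 * s1**2)
v = (s0**2 - s1**2) / (2 * s0**2 * s1**2)

def posterior_quadratic(x):
    # P(Y=1|X) written with the quadratic exponent w0 + w.x + v.x^2
    return 1.0 / (1.0 + np.exp(w0 + np.dot(w, x) + np.dot(v, x**2)))

def posterior_bayes(x):
    # Direct Bayes' rule with the naive (independent-feature) Gaussian likelihoods
    p1 = pi * np.prod(norm.pdf(x, mu1, s1))
    p0 = (1 - pi) * np.prod(norm.pdf(x, mu0, s0))
    return p1 / (p0 + p1)

x = rng.normal(size=2)
print(posterior_quadratic(x), posterior_bayes(x))   # the two values agree
```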